Introduction

The goal of this study is to assess the impact of certain variables on the climate by examining the Air Quality Index (AQI) of counties across the United States, using data collected by the Environmental Protection Agency (EPA).

This presentation contains two sub-studies: one examining the effects of the Climate Alliance legislative program, and another examining the correlation between county characteristics and air quality.

Reading the Data and EDA

To begin, we read in the data from the EPA datasets.

## [1] 85.3

Climate Alliance

## 
## Call:
## lm(formula = med.aqi ~ is.climate.alli + state, data = focus_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36.82  -3.06   1.47   5.68  94.52 
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          37.5294     1.8189   20.63  < 2e-16 ***
## is.climate.alliyes   -1.1008     2.3058   -0.48  0.63312    
## stateAlaska         -17.0000     3.1503   -5.40  7.6e-08 ***
## stateArizona          6.3167     2.7630    2.29  0.02235 *  
## stateArkansas        -1.7294     2.7942   -0.62  0.53603    
## stateCalifornia       9.8827     1.7521    5.64  1.9e-08 ***
## stateColorado         0.3419     1.9628    0.17  0.86172    
## stateConnecticut      2.1339     3.0064    0.71  0.47792    
## stateDelaware         4.7381     4.5558    1.04  0.29846    
## stateFlorida         -2.3371     2.1795   -1.07  0.28371    
## stateGeorgia         -0.8570     2.2908   -0.37  0.70836    
## stateHawaii          -8.4286     4.0086   -2.10  0.03562 *  
## stateIdaho          -12.5294     2.4467   -5.12  3.3e-07 ***
## stateIllinois        -1.1323     2.0227   -0.56  0.57570    
## stateIndiana         -3.4669     2.1712   -1.60  0.11048    
## stateIowa            -1.2169     2.6121   -0.47  0.64136    
## stateKansas          -5.2112     2.9019   -1.80  0.07267 .  
## stateKentucky         0.5261     2.3219    0.23  0.82076    
## stateLouisiana       -2.4068     2.1104   -1.14  0.25423    
## stateMaine           -2.0786     2.7627   -0.75  0.45192    
## stateMaryland         1.3950     2.3058    0.60  0.54527    
## stateMassachusetts   -0.2986     2.7630   -0.11  0.91394    
## stateMichigan        -1.5536     2.0043   -0.78  0.43836    
## stateMinnesota       -5.7857     2.1649   -2.67  0.00759 ** 
## stateMississippi     -0.0056     2.9435    0.00  0.99848    
## stateMissouri        -1.0502     2.3773   -0.44  0.65870    
## stateMontana         -9.6610     2.5036   -3.86  0.00012 ***
## stateNebraska       -11.4739     3.0915   -3.71  0.00021 ***
## stateNevada          -0.9286     2.8736   -0.32  0.74662    
## stateNew Hampshire   -2.8151     3.3679   -0.84  0.40332    
## stateNew Jersey       2.5089     2.3502    1.07  0.28586    
## stateNew Mexico      -4.2411     2.3502   -1.80  0.07130 .  
## stateNew York        -4.9608     1.9552   -2.54  0.01125 *  
## stateNorth Carolina  -0.4549     1.8678   -0.24  0.80761    
## stateNorth Dakota    -3.2794     2.9887   -1.10  0.27265    
## stateOhio            -2.4104     2.1558   -1.12  0.26365    
## stateOklahoma        -2.5702     2.3672   -1.09  0.27772    
## stateOregon          -8.5350     2.0980   -4.07  4.9e-05 ***
## statePennsylvania     0.9294     1.8432    0.50  0.61412    
## stateRhode Island    -0.4286     4.5558   -0.09  0.92506    
## stateSouth Carolina  -1.5016     2.5363   -0.59  0.55387    
## stateSouth Dakota    -5.8794     2.9887   -1.97  0.04929 *  
## stateTennessee       -1.9425     2.3986   -0.81  0.41814    
## stateTexas           -4.8593     2.1137   -2.30  0.02161 *  
## stateUtah             5.7039     2.6566    2.15  0.03191 *  
## stateVermont         -7.0536     4.0086   -1.76  0.07862 .  
## stateVirginia        -6.9139     1.9138   -3.61  0.00031 ***
## stateWashington     -10.7892     1.9628   -5.50  4.4e-08 ***
## stateWest Virginia   -8.3732     2.6121   -3.21  0.00137 ** 
## stateWisconsin            NA         NA      NA       NA    
## stateWyoming          1.6650     2.5363    0.66  0.51158    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.6 on 2029 degrees of freedom
## Multiple R-squared:  0.171,  Adjusted R-squared:  0.151 
## F-statistic: 8.55 on 49 and 2029 DF,  p-value: <2e-16

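In the fitted model, the coefficient for Climate Alliance membership is -1.10 (p = 0.63): member states tend to have a slightly better (lower) AQI on average, but the difference is not statistically significant. Note that the output reports one coefficient as not defined because of singularities. Membership is determined at the state level, so the membership indicator is a linear combination of the state indicators, and this estimate should be interpreted with caution.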

One possible explanation is that the Climate Alliance was formed only recently, in 2017, three years before this study.
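The "not defined because of singularities" note in the regression output can be reproduced on toy data. The sketch below is an illustration under assumptions, not the study's code: the four states, their membership status, and the AQI values are all made up. It suggests that when membership is constant within each state, `lm` drops one state dummy, and the membership coefficient reduces to the difference between that state's mean and the reference state's mean.

```r
# Toy data (hypothetical): 4 states with 5 counties each.
# Membership is fixed per state, as it is for a coalition of states.
set.seed(1)
toy <- data.frame(state = factor(rep(c("A", "B", "C", "D"), each = 5)))
toy$member  <- factor(ifelse(toy$state %in% c("B", "D"), "yes", "no"))
toy$med_aqi <- 40 + rnorm(nrow(toy), sd = 5)

# Same model shape as in the report: membership indicator plus state.
fit <- lm(med_aqi ~ member + state, data = toy)

# memberyes equals stateB + stateD, so the design matrix is rank deficient
# and lm() reports NA for the last aliased column.
na_terms <- names(coef(fit))[is.na(coef(fit))]
print(na_terms)

# The membership coefficient is just mean(D) - mean(A) in this toy setup.
mean_d <- mean(toy$med_aqi[toy$state == "D"])
mean_a <- mean(toy$med_aqi[toy$state == "A"])
print(c(member_coef = unname(coef(fit)["memberyes"]), diff = mean_d - mean_a))
```

One way to separate a membership effect from individual state effects would be to drop the state term or to treat state as a random effect; which choice is appropriate depends on the question being asked.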

County Level Effects on AQI

Using the latest data from the USDA's Economic Research Service (ERS), we look for county-level predictors of air quality. We first merge the three sets of ERS county data with one another, and then merge the result with the AQI data by county and state.

We use only AQI data from 2019 to keep the analysis consistent, and we avoid 2020 data because the fires on the West Coast that year would skew the results.
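The merging steps described above can be sketched in base R. This is a minimal illustration under assumptions: the three tables `ers_a`, `ers_b`, `ers_c`, the `aqi` table, and every value in them are hypothetical stand-ins, and only the column names `PctEmpFIRE`, `FemaleHHPct`, `Ed4AssocDegreePct`, and `med.aqi` are taken from the output in this report.

```r
# Hypothetical stand-ins for the three ERS county tables (values are made up).
keys  <- data.frame(state  = c("Ohio", "Ohio", "Utah"),
                    county = c("Adams", "Allen", "Cache"))
ers_a <- cbind(keys, PctEmpFIRE        = c(3.1, 4.2, 5.0))
ers_b <- cbind(keys, FemaleHHPct       = c(10.5, 12.0, 7.8))
ers_c <- cbind(keys, Ed4AssocDegreePct = c(8.0, 9.1, 9.9))

# Hypothetical AQI table with one row per county and year.
aqi <- data.frame(state   = c("Ohio", "Ohio", "Utah", "Utah"),
                  county  = c("Adams", "Adams", "Cache", "Cache"),
                  year    = c(2019, 2020, 2019, 2020),
                  med.aqi = c(38, 41, 45, 60))

# Step 1: merge the three ERS tables on state and county.
by_cols <- c("state", "county")
ers <- Reduce(function(a, b) merge(a, b, by = by_cols),
              list(ers_a, ers_b, ers_c))

# Step 2: keep 2019 AQI only, then merge with the ERS data.
# merge() performs an inner join by default, so counties without
# an AQI record (Ohio/Allen here) are dropped.
merged <- merge(ers, subset(aqi, year == 2019), by = by_cols)
print(merged)
```

Merging on both state and county matters because county names repeat across states.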

We split the cleaned and merged dataset into a predictor matrix X and a response vector Y for use with cv.glmnet, and call set.seed(1) beforehand so that the cross-validation folds are reproducible.
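A minimal sketch of this step on simulated data follows. It assumes the default lasso penalty (`alpha = 1`) and the `lambda.1se` rule for picking the penalty, neither of which is stated in the report; the predictors `x1` to `x4` and the response values are made up.

```r
library(glmnet)

# Simulated stand-in for the merged county data (all values hypothetical).
set.seed(1)
n   <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$med_aqi <- 40 + 3 * toy$x1 - 2 * toy$x2 + rnorm(n)

# cv.glmnet expects a numeric matrix and a response vector.
# model.matrix() expands factors into dummy columns; drop the intercept column.
X <- model.matrix(med_aqi ~ ., data = toy)[, -1]
Y <- toy$med_aqi

# Fold assignment is random, so fix the seed immediately before the call.
set.seed(1)
cv_fit <- cv.glmnet(X, Y, alpha = 1)

# Coefficients at the more conservative penalty; nonzero rows are "selected".
cf       <- as.matrix(coef(cv_fit, s = "lambda.1se"))[, 1]
selected <- setdiff(names(cf)[cf != 0], "(Intercept)")
print(selected)
```

Variables with nonzero coefficients at the chosen penalty are natural candidates to carry into an ordinary `lm` fit for further testing.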

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                      Sum Sq  Df F value  Pr(>F)    
## state                 15670  48    3.44 2.5e-13 ***
## PctEmpAgriculture       109   1    1.15  0.2848    
## PctEmpConstruction      174   1    1.83  0.1761    
## PctEmpFIRE              734   1    7.73  0.0055 ** 
## Age65AndOlderPct2010     50   1    0.53  0.4676    
## Ed4AssocDegreePct       774   1    8.16  0.0044 ** 
## FemaleHHPct            1681   1   17.71 2.8e-05 ***
## HH65PlusAlonePct        578   1    6.09  0.0138 *  
## Ed3SomeCollegeNum       737   1    7.77  0.0054 ** 
## ForeignBornMexNum       610   1    6.43  0.0114 *  
## NetMigrationNum0010    1698   1   17.90 2.6e-05 ***
## Residuals             89962 948                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the Anova table above, Age65AndOlderPct2010 has the largest p-value (0.47) and is therefore the least significant term, so we remove it and refit the model.

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                     Sum Sq  Df F value  Pr(>F)    
## state                15623  48    3.43 2.9e-13 ***
## PctEmpAgriculture       92   1    0.97  0.3246    
## PctEmpConstruction     143   1    1.50  0.2205    
## PctEmpFIRE             723   1    7.62  0.0059 ** 
## Ed4AssocDegreePct      744   1    7.84  0.0052 ** 
## FemaleHHPct           1652   1   17.41 3.3e-05 ***
## HH65PlusAlonePct       950   1   10.01  0.0016 ** 
## Ed3SomeCollegeNum      732   1    7.72  0.0056 ** 
## ForeignBornMexNum      618   1    6.52  0.0108 *  
## NetMigrationNum0010   1683   1   17.74 2.8e-05 ***
## Residuals            90012 949                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the Anova table above, PctEmpAgriculture now has the largest p-value (0.32), so we remove it and refit the model.

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                     Sum Sq  Df F value  Pr(>F)    
## state                16002  48    3.51 8.3e-14 ***
## PctEmpConstruction     124   1    1.31 0.25270    
## PctEmpFIRE            1037   1   10.93 0.00098 ***
## Ed4AssocDegreePct      685   1    7.22 0.00732 ** 
## FemaleHHPct           1667   1   17.58 3.0e-05 ***
## HH65PlusAlonePct      1046   1   11.03 0.00093 ***
## Ed3SomeCollegeNum      786   1    8.29 0.00408 ** 
## ForeignBornMexNum      614   1    6.47 0.01112 *  
## NetMigrationNum0010   1704   1   17.96 2.5e-05 ***
## Residuals            90104 950                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the Anova table above, PctEmpConstruction has the largest p-value (0.25), so we remove it and refit the model. All remaining terms are significant at the 0.05 level.

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                     Sum Sq  Df F value  Pr(>F)    
## state                16606  48    3.65 1.1e-14 ***
## PctEmpFIRE            1127   1   11.88 0.00059 ***
## Ed4AssocDegreePct      733   1    7.73 0.00555 ** 
## FemaleHHPct           1974   1   20.81 5.7e-06 ***
## HH65PlusAlonePct      1139   1   12.01 0.00055 ***
## Ed3SomeCollegeNum      814   1    8.58 0.00348 ** 
## ForeignBornMexNum      582   1    6.13 0.01347 *  
## NetMigrationNum0010   1679   1   17.69 2.8e-05 ***
## Residuals            90228 951                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##                      Estimate Std. Error t value Pr(>|t|)
## PctEmpFIRE           6.13e-01   1.78e-01    3.45 5.91e-04
## Ed4AssocDegreePct   -5.29e-01   1.90e-01   -2.78 5.55e-03
## FemaleHHPct          5.35e-01   1.17e-01    4.56 5.73e-06
## HH65PlusAlonePct    -4.73e-01   1.36e-01   -3.47 5.53e-04
## Ed3SomeCollegeNum    1.42e-05   4.86e-06    2.93 3.48e-03
## ForeignBornMexNum    2.15e-05   8.67e-06    2.48 1.35e-02
## NetMigrationNum0010  2.78e-05   6.61e-06    4.21 2.84e-05
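The backward elimination carried out in the tables above can be sketched as a loop. This is an illustration under assumptions, not the study's code: the data are simulated, the 0.05 stopping threshold is assumed, and base R's `drop1` is used as a stand-in for the Type II Anova table (for a model with no interaction terms, both test each term after all the others).

```r
# Simulated stand-in data (hypothetical): two real predictors, two noise ones.
set.seed(1)
n   <- 300
toy <- data.frame(
  state  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  x1     = rnorm(n),
  x2     = rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)
toy$med_aqi <- 40 + 2 * toy$x1 - 1.5 * toy$x2 + 3 * (toy$state == "B") + rnorm(n)

fit   <- lm(med_aqi ~ state + x1 + x2 + noise1 + noise2, data = toy)
alpha <- 0.05  # assumed stopping threshold

repeat {
  tests <- drop1(fit, test = "F")          # one F-test per term
  p     <- tests[["Pr(>F)"]][-1]           # skip the "<none>" row
  names(p) <- rownames(tests)[-1]
  p <- p[names(p) != "state"]              # always keep state in the model
  if (length(p) == 0 || max(p) <= alpha) break
  worst <- names(which.max(p))             # least significant term
  fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
}

kept <- attr(terms(fit), "term.labels")
print(kept)
```

Removing one term at a time and refitting, as done above, matters because p-values for the remaining terms shift after each removal.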

From the final model, we see that most of the impact on AQI is geographical. For example, the positive coefficients on ForeignBornMexNum and NetMigrationNum0010 could signal that states closer to the Mexican border tend to have worse AQIs due to their location. However, the clearest predictors are the states themselves.

The assumptions for linearity appear to hold up until about 1 standard deviation below the mean.

Conclusion

The overall objective of this study was to use the AQI of counties across the USA to determine the impact of variables on the climate. Using data collected by the EPA, we were able to focus on the effect of the Climate Alliance on curbing the deterioration of the AQI across the nation, as well as the correlation between aspects of counties and their air quality.

From this study, we were able to conclude that the Climate Alliance has not yet had a statistically significant effect on the AQI of member states, although member states do have slightly better AQIs on average than other states. We were also able to see that most of the impact on the AQI is geographical, based on the significant variables in the model.